Instrument the realm-server search path with a request→response timeline (CS-11363) by habdelra · Pull Request #5098 · cardstack/boxel

habdelra · 2026-06-03T20:26:54Z

What this does

Adds server-side timing instrumentation to the realm-server's federated-search path, so we can answer where a slow _search spends its time — from request receipt to response sent ("hoof to snout") — instead of guessing.

Why

During a from-scratch index of a query-field-heavy realm, prerendered cards stall in the waiting-stability stage for tens of seconds — up to the 90s render timeout — waiting on an in-render _search round-trip (a query-backed linksToMany getter, e.g. a customer's policies field). The card's own queryLoadsInFlight diagnostics show the search in flight for ~88s, yet pg_stat_activity shows zero SQL running longer than a second. So the time is not query execution — the realm-server (a single Node process) is slow to deliver the response, and we had no way to see why.

The gap: the host-side boxel_index.diagnostics column already captures the client's view (queryLoadsInFlight/recentQueryLoads — that's how we know it waited 88s), but the server side of that same request was completely unmetered. The whole pipeline — receipt → SQL → the post-SQL loadLinks relationship/query-field assembly → the per-instance cache round-trips → JSON.stringify of the wire-format → send — emitted nothing, there was no per-request duration, and nothing linked the client's view to the server's.

How it fits together

The search request flows: prerendered host SPA issues _federated-search → realm-server handle-search → searchRealms → per-realm Realm.search → RealmIndexQueryEngine.searchCards → loadLinks → SQL. The instrumentation threads a small RequestTimings collector down that existing path (via the shared SearchOpts, alongside the job-id/priority already threaded there) and emits one line per request.

The four pieces:

Correlation id (client → server). requestIdHeader() mints x-boxel-request-id on the prerendered host's _federated-search fetch, gated on the prerender context exactly like the existing job-id / consuming-realm headers — so live SPA traffic stamps nothing. It's a join key only; no new client-side measurement (the client wait is already in the diagnostics column). The realm-server reads it back out and keys its timeline on it.
Request→response timeline. handle-search owns a RequestTimings collector (created only when a correlation id is present) and stamps parse, resolveRealms, and stringify directly; searchRealms/searchCards/loadLinks stamp sql, loadLinks, populate (the relationship/query-field assembly), and the per-instance cacheRead/cacheWrite + hit/miss counts onto the same collector. The handler emits one realm:search-timing line: req=… job=… handler=Nms parse=… resolveRealms=… sql=… loadLinks=… populate=… stringify=… | results=… cacheHit=… cacheMiss=…. The correlation id is deliberately kept out of the job-scoped search cache's key material so it can't blow up the cache.
Request duration + id in the request log. The realm:requests middleware now logs the <-- … --> round-trip dur= and the req= id — the outermost request→response bound (includes body read + send), and the join key tying the timeline line, the request line, and the client diagnostics together.
Event-loop health sampler. A periodic realm:health line (only during saturation windows) reporting event-loop lag (mean/p99/max) alongside the in-flight _search count and heap. This is the signal that confirms or refutes the prime suspect for the 88s — the single-threaded process's event loop being starved (by synchronous post-SQL serialization across many concurrent searches) so requests sit unserviced. The gap between the middleware dur= total and the handler's stage sum, read against event-loop lag at the time, localizes queue-before-handler time.

Everything is gated on the presence of a correlation id, so normal/live traffic allocates no collector and emits no extra lines. RequestTimings is threaded through runtime-common but only ever runs in the realm-server (RealmIndexQueryEngine needs the Postgres adapter the browser doesn't have), so the host is unaffected.

Test

A host integration test exercises the whole path end-to-end through the in-realm browser: the SPA mints the correlation id (real requestIdHeader()), issues a real store.search, and the test asserts the id lands in the server's realm:search-timing line (with the sql/loadLinks stage breakdown), via the realm-server mock handing the header to the real searchRealms. A second case asserts that non-prerender traffic stamps no header and emits no line.

Scope

Measurement only — this localizes where the latency is so we can fix it; it does not change the search behavior or fix the latency itself.

Linear: CS-11363

…ine (CS-11363) During a from-scratch index, prerendered cards block for tens of seconds on in-render `_search` round-trips while the SQL behind them runs in milliseconds. We could see the client wait (host-side `boxel_index.diagnostics`) but had no server-side view of where those seconds went. This adds that view: - A client-minted `x-boxel-request-id` correlation id, stamped on the prerendered host's `_federated-search` fetch (prerender-gated; live SPA traffic is untouched). Join key only — no new client measurement. - A `realm:search-timing` line keyed by that id with the request→response stage breakdown: parse, resolveRealms, sql, loadLinks, populate (relationship/query-field assembly), per-instance cache read/write + hit/miss, stringify, and result count. Emitted by `handle-search` for the full handler picture, and standalone by `searchRealms` on the host-test path. Gated on a correlation id, so normal traffic allocates nothing. - Request duration + the correlation id on the `realm:requests` log lines (the outermost request→response bound, incl. body read + send). - A `realm:health` sampler: event-loop lag (mean/p99/max) alongside the in-flight `_search` count and heap, logged only during saturation windows — the signal that confirms or refutes single-thread saturation directly. The timing collector is threaded through the shared `SearchOpts` and is a no-op without a correlation id; `RealmIndexQueryEngine` only ever runs in the realm-server, so the host stays unaffected. A host integration test exercises the whole path end-to-end: the in-realm browser mints the id, issues a real `store.search`, and the test asserts the id lands in the server's `realm:search-timing` line (and that live, non-prerender traffic stamps nothing and emits nothing). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 498aab2c5c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copilot

Pull request overview

Adds end-to-end, request-correlated timing instrumentation for the realm-server federated search pipeline so slow _search calls can be attributed to specific server-side stages (parse, realm resolution, SQL, link loading/population, cache read/write, stringify) and correlated with request logs and client diagnostics.

Changes:

Introduces a RequestTimings collector and threads it through searchRealms → Realm.search → RealmIndexQueryEngine to stamp stage durations + counters and emit a single realm:search-timing line keyed by a client-minted correlation id.
Adds a prerender-only x-boxel-request-id header minted by the host SPA and sanitized/echoed by the realm-server request middleware, including dur= in the realm:requests log for full request→response timing.
Adds a realm-server health sampler that periodically records event-loop lag + in-flight search count during saturation windows, plus integration coverage verifying header stamping and timing-line emission behavior.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
packages/runtime-common/search-utils.ts	Extends `SearchOpts` with correlation/timings, adds a testable timing-log emitter, and optionally emits a per-request timing line when it “owns” the collector.
packages/runtime-common/request-timings.ts	New request-scoped stage timer + counters with compact log formatting.
packages/runtime-common/realm.ts	Threads `timings` through realm search options into the query engine.
packages/runtime-common/realm-index-query-engine.ts	Times `sql`, `loadLinks`, `populate`, and per-instance cache read/write; increments cache hit/miss counters.
packages/runtime-common/prerender-headers.ts	Defines/sanitizes `x-boxel-request-id` to prevent log forging and provide a shared header name.
packages/runtime-common/index.ts	Exports `RequestTimings` from runtime-common’s public surface.
packages/realm-server/search-inflight.ts	New module-level in-flight `_search` counter for health sampling correlation.
packages/realm-server/middleware/index.ts	Adds `req=` correlation + `dur=` to request logs, echoes sanitized request id, and tracks in-flight search lifecycle (finish/close).
packages/realm-server/main.ts	Starts/stops the health sampler with the server lifecycle.
packages/realm-server/health-sampler.ts	New event-loop lag sampler that logs `realm:health` only during saturation windows.
packages/realm-server/handlers/handle-search.ts	Creates/threads request timing collector (when correlation id present), stamps parse/resolve/stringify, and emits the full timeline line (incl. cache-hit paths).
packages/host/tests/integration/search-correlation-id-test.gts	New integration test asserting prerender-only header stamping and server timing-line correlation.
packages/host/tests/helpers/realm-server-mock/routes.ts	Threads sanitized request id into `searchRealms` in the mock so tests observe real timing emission.
packages/host/app/services/store.ts	Adds `requestIdHeader()` to `_federated-search` requests.
packages/host/app/lib/prerender-fetch-headers.ts	Implements prerender-gated per-request correlation id minting with `crypto.randomUUID` fallback.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

github-actions · 2026-06-03T20:33:02Z

Preview deployments

Host staging preview

Host production preview

Host Test Results

1 files 1 suites 1h 50m 18s ⏱️
2 933 tests 2 918 ✅ 15 💤 0 ❌
2 952 runs 2 937 ✅ 15 💤 0 ❌

Results for commit 706ed8b.

Realm Server Test Results

1 files 1 suites 11m 0s ⏱️
1 549 tests 1 548 ✅ 1 💤 0 ❌
1 640 runs 1 639 ✅ 1 💤 0 ❌

Results for commit 706ed8b.

…ed-search timing - CORS: add the correlation header to the realm-server's `Access-Control-Allow-Headers` allowlist. Without it the cross-origin preflight blocks the prerender's `_federated-search` once the host stamps the header. - Rename `x-boxel-request-id` → `x-boxel-logging-correlation-id` (and the internal field/vars to `loggingCorrelationId`, the log token to `corr=`): the old name read too much like the unrelated `x-boxel-client-request-id`, which is realm-event/write correlation, a different concern. - searchCards in-flight coalescing: exclude the per-request `timings` collector from `searchInFlightKey` so it can't perturb the dedup key, and when a follower awaits an already-running identical search, record that wait as `coalescedWait` on the follower's own collector — otherwise its timing line would show no sql/loadLinks and look misleadingly instant under concurrent search load. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

habdelra requested a review from Copilot June 3, 2026 20:27

Copilot started reviewing on behalf of habdelra June 3, 2026 20:27 View session

chatgpt-codex-connector Bot reviewed Jun 3, 2026

View reviewed changes

Comment thread packages/runtime-common/realm.ts

Copilot AI reviewed Jun 3, 2026

View reviewed changes

habdelra requested a review from a team June 3, 2026 21:19

lukemelia approved these changes Jun 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Instrument the realm-server search path with a request→response timeline (CS-11363)#5098

Instrument the realm-server search path with a request→response timeline (CS-11363)#5098
habdelra wants to merge 2 commits into
mainfrom
worktree-cs-search-server-timing

habdelra commented Jun 3, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

github-actions Bot commented Jun 3, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

habdelra commented Jun 3, 2026

What this does

Why

How it fits together

Test

Scope

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

github-actions Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Preview deployments

Host Test Results

Realm Server Test Results

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions Bot commented Jun 3, 2026 •

edited

Loading